This report summarizes the research conducted under DARPA Contract
No. MDA903-81-C-0203. For this contract we examined validation
topics in both a real-world and a laboratory setting. In
the laboratory experiments we taught subjects additive and
multiplicative value functions via outcome feedback. We found that
standard  MAUM  procedures  recovered  the  taught  functions.  We  also 
found behavioral differences between value and utility elicitation

gauge the validity of alternative multiattribute utility elicitation
techniques. The results indicate that subjects learned the
value functions very well, independently of whether the problem
involved  two  or  four  attributes,  equal  or  unequal  weights, 
additive  or  multiplicative  functions.  Riskless  value  and  risky 
utility  elicitation  methods  were  able  to  identify  the  structural 
properties  of  the  taught  models  (additive  vs.  multiplicative, 
sign  of  the  interaction  parameter)  quite  well,  although  risky 
methods generated a tendency towards multiattribute risk aversion
in additive models. Furthermore, for the simple models (e.g.
additive, equal weight, and multiplicative equal weight), the
standard elicitation methods were able to recapture the taught
model parameters quite well. The ability of multiattribute
utility  techniques  to  recover  value  functions  decreased,  however, 
when models became very complex (e.g. multiplicative unequal
weights). In these cases simple methods like ratio weighting
and a hybrid combination of methods outperformed the "formally
correct" methods.

VALIDATION OF MULTIATTRIBUTE UTILITY PROCEDURES
Introduction 

The  experiments  of  this  period  were  concerned  with  validation  of 
decision  analytic  tools  in  both  a  real-world  and  a  laboratory  setting. 
Our  previous  work  has  suggested  that  in  many  instances  decision 
analytic  procedures  can  improve  human  performance,  and  that 
simplification  of  the  technology  will  extend  its  potential  uses.  Thus 
a  major  thrust  in  past  research  has  been  on  simplification  with 
concomitant  validation  of  the  simplified  tools.  The  work  of  this 
contract  period  has  expressly  tested  the  limits  of  our  previous  work 
and  suggests  areas  where  more  validation  and/or  exploration  is  needed 
and  potential  situations  where  simplification  should  be  approached 
rather  more  cautiously  than  was  previously  thought. 

Two studies were performed. The first consisted of two
laboratory experiments, which used an extension of the Multiple Cue
Probability Learning (MCPL) paradigm developed in this laboratory to pit
several different decision analytic techniques against each other across
conditions with different underlying "true" structural models. The
other study applied Edwards's Simple Multiattribute Rating Technique
(SMART) (Edwards, 1977; Edwards and Newman, 1982) to a complex
real-world evaluation problem. This report will summarize the findings and
lessons of the work. For more detailed descriptions see Griffin and
Edwards (1982) and John and von Winterfeldt (1982).

Teaching  and  Recovering  Additive  and  Multiplicative  Value  Functions 

As  a  technology  matures  one  expects  the  precision  of  its 
application to be refined, subtleties to be explored, and in general,
the implications of theory and the practicalities of application to be
more fully integrated. In Multiattribute Utility Measurement (MAUM)
such refinements include an exploration of the measurement theoretic
bases of the model (e.g. risky vs. riskless models, or "utility" vs.
"value"); the structural form of the multiattribute model (e.g.
additive vs. multiplicative); and the functional forms of the
single-attribute value or utility functions (e.g. linear vs. non-linear).

The  two  experiments  of  the  laboratory  study  (reported  by  John 
and  von  Winterfeldt,  1982)  compared  several  assessment  techniques, 
which  varied  on  the  above  dimensions  across  a  range  of  "true"  riskless 
multiattribute  structures.  We  were  primarily  interested  in  the 
validity  with  which  simple  methods  and  models  could  recover  complex 
"true"  value  structures.  The  unexpected  results  of  these  experiments 
answer  questions  about  the  validity  of  structures  that  had  previously 
not  been  addressed. 

Three  different  assessment  techniques  were  used  in  this  study. 
Two  of  them  arise  out  of  formal  measurement-theoretic  models  for 
quantifying  preference.  One  of  the  methods  is  formally  appropriate 
for eliciting value functions and structures; that is, models of
preference without risk. The other, a utility method, is
formally  appropriate  for  modeling  risky  choice.  Our  final  assessment 
technique  is  of  the  kind  proposed  by  Edwards  (1977)  and  others,  which 
is  not  based  upon  strict  measurement  theory,  but  upon  the  psychology 
of  numerical  estimates  of  subjective  quantities:  magnitude  scaling. 
This  technique  follows  the  logic  of  value  or  utility  theory  but 
includes  judgments  that  have  no  strict  measurement  theoretic 
justification, and uses additive models without formal independence
checks.

Edwards  and  his  colleagues  (see  Edwards,  1977;  Edwards  and 
Newman,  1982)  have  argued  that: 

1.  Additive  aggregation  rules  are  good  approximations  to  nonadditive 
(e.g.,  multiplicative)  rules; 

2.  Linear  single-attribute  value  functions  are  good  approximations  to 
non-linear  (e.g.,  exponential)  forms; 

3. Questions about strengths of preference and/or gambles are
difficult for respondents to understand, whereas ratings on
attributes (location measures) and judgments of relative
importance (weights) are more intuitive;

4. The lack of an error theory in both value and utility measurement
raises the possibility that more complex models of preference are
more susceptible to "random" errors that could lead to greater
overall error than found with (structurally incorrect)
atheoretical rating scale models.

Previous  work  (both  theoretical  and  empirical)  in  this  research 
program  has  demonstrated  that  these  assertions  are  often  valid. 

The  unique  feature  of  this  study  was  to  compare  these  three 
techniques  across  a  variety  of  additive  and  nonadditive  models.  To  do 
this  subjects  were  "taught"  value  models  through  outcome  feedback. 
Across  conditions  the  number  of  attributes  was  varied  as  was  the 
structural  form  of  the  model  (i.e.  additive  vs.  multiplicative),  and 
the  "strength"  (in  the  multiplicative  conditions)  of  the  interaction 
term.

Experiment I

Twenty undergraduates were each taught one of five different
two-attribute models of diamond worth. Each subject saw 100 "diamond
profiles"  on  a  video  display,  estimated  the  value  of  each  diamond,  and 
received  outcome  feedback  about  the  "actual"  value  of  the  diamond. 

Models  taught  varied  the  trade-off  between  "quality"  and  "size." 
Trade-offs  were  either  additive  or  multiplicative,  and  multiplicative 
models  were  either  complementing  or  substituting.  For  the  additive 
models  the  weight  ratios  (trade-offs)  were  either  4:1  or  1:1. 
Complementing  models  used  either  2:1  or  1:1  weights  and  the 
substituting models used 1:1 weights. All five models used
single-attribute value functions linear in "quality" and "size," the two
variables  comprising  each  diamond  profile. 
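
The three model families described above can be illustrated with a
minimal sketch. It assumes the standard two-attribute multiplicative
form v = k1*v1 + k2*v2 + K*k1*k2*v1*v2, with single-attribute values
scaled from 0 to 1 and K fixed by requiring v(1, 1) = 1. The function
names and scaling constants are hypothetical and are not the
parameters actually taught in the experiment.

```python
def interaction_constant(k1, k2):
    """K chosen so that v(1, 1) = 1 in the two-attribute multiplicative form."""
    return (1.0 - k1 - k2) / (k1 * k2)


def value(v1, v2, k1, k2):
    """Two-attribute value; reduces to the additive rule when k1 + k2 = 1."""
    K = interaction_constant(k1, k2)
    return k1 * v1 + k2 * v2 + K * k1 * k2 * v1 * v2


def model_type(k1, k2, tol=1e-9):
    """Classify the model by the sign of the interaction constant K."""
    K = interaction_constant(k1, k2)
    if abs(K) < tol:
        return "additive"
    return "complementing" if K > 0 else "substituting"


# Hypothetical scaling constants, chosen only to show the three families.
examples = {
    "additive 4:1": (0.8, 0.2),
    "additive 1:1": (0.5, 0.5),
    "complementing 2:1": (0.4, 0.2),
    "complementing 1:1": (0.3, 0.3),
    "substituting 1:1": (0.7, 0.7),
}
for label, (k1, k2) in examples.items():
    # Value of a diamond that is middling on both quality and size.
    print(label, model_type(k1, k2), round(value(0.5, 0.5, k1, k2), 3))
```

In this form a complementing model values a diamond that is middling
on both attributes below the additive 1:1 rule, and a substituting
model values it above.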

Following  training,  each  subject  met  with  one  of  two  analysts 
who  knew  nothing  about  what  model  the  subject  had  been  taught. 
Analysts guided subjects through a series of questions about critical
value differences, direct subjective estimates of "importance weight"
ratios, and gamble indifferences of two kinds: basic reference lottery
tickets (BRLTs) and certainty equivalents.

Models  of  each  subject's  judgments  were  constructed  based  on  the 
analyst's  session  and  the  last  50  estimates  the  subjects  gave  during 
the  computer  session.  Four  multiattribute  models  were  constructed 
from  the  analyst-session  judgments.  A  multiplicative  value  model 
assuming  linear  single-attribute  value  functions  was  constructed  from 
value  difference  judgments.  Two  "importance  weight"  models  (one 
additive  and  one  multiplicative)  were  constructed  from  the  importance 
weight judgment and (in the multiplicative case) one value judgment.
Finally, a utility model, assuming linear single-attribute utility
functions,  was  constructed  from  lottery  judgments.  The  bootstrapped 
models  based  on  the  last  50  trials  of  the  computer  session  included 
both  an  additive  and  multiplicative  functional  form  model. 
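
A bootstrapped model of this kind can be thought of as a regression of
a subject's estimates on the displayed attribute levels. The sketch
below shows one plausible way to fit the two functional forms. It
assumes ordinary least squares, which this report does not state was
the procedure used, and it runs on synthetic, noise-free judgments
from a hypothetical complementing model. All names are illustrative.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def fit_bootstrapped(profiles, judgments, interaction):
    """Fit intercept + quality + size, plus quality*size if requested."""
    rows = [[1.0, q, s] + ([q * s] if interaction else []) for q, s in profiles]
    return least_squares(rows, judgments)


# Synthetic stand-in for the last 50 trials of one subject (hypothetical data).
rng = random.Random(0)
profiles = [(rng.random(), rng.random()) for _ in range(50)]
judgments = [0.3 * q + 0.3 * s + 0.4 * q * s for q, s in profiles]

additive_fit = fit_bootstrapped(profiles, judgments, interaction=False)
multiplicative_fit = fit_bootstrapped(profiles, judgments, interaction=True)
print("additive form:", [round(c, 3) for c in additive_fit])
print("multiplicative form:", [round(c, 3) for c in multiplicative_fit])
```

The sign of the fitted product coefficient indicates whether the
judgments look complementing (positive) or substituting (negative).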

This  experiment  demonstrated  that  subjects  could  learn  both 
additive  and  nonadditive  trade-off  relations,  and  that  these  newly 
acquired  value  structures  could  be  successfully  discovered  via 
standard  multiattribute  value  and  utility  assessment  procedures. 

However,  these  general  positive  findings  on  the  validity  of  the 
paradigm  and  the  assessment  techniques  have  to  be  tempered  with 
specifics.  For  instance,  we  found  the  value  and  utility  models  to  be 
an  improvement  over  the  additive  importance-weight  models  when  the 
taught  model  was  multiplicative.  In  contrast,  when  the  model  taught 
was  additive  the  elicited  value  and  utility  models  did  not  tend  to 
capture  that  additivity  (i.e.  the  interaction  term  was  non-zero). 
Teaching  unequal  weights  models  decreased  the  performance  of  the 
elicited  models,  particularly  so  for  the  utility  models. 

All of these findings apply to the somewhat restricted
two-attribute case. The second experiment attempted to replicate our
findings  using  four  attributes. 


Experiment  II 

Ten undergraduates were each taught one of five different
four-attribute models of diamond worth. The training procedure was similar
to that in Experiment I, except that diamonds were described in terms
of the four Cs: cut, color, clarity, and carat. Just as in
Experiment I, true models were either additive, complementing, or
substituting. Weights for additive and complementing models were
either all equal, or in the ratio 4:3:2:1. Only an equal-weights
substituting model was used. Following computer training sessions
subjects  went  through  an  analyst  session  where  the  same  types  of 
judgments  as  in  Experiment  I  were  elicited. 

The  results  of  the  computer  sessions  replicated  the  finding  of 
Experiment  I  that  subjects  could  learn  both  additive  and  non-additive 
trade-off  relations  in  the  more  general  four-attribute  case.  From  the 
elicitation  sessions  we  found  that  complementing  and  substituting 
models  were  recoverable  (that  is  the  elicited  interaction  term  was 
non-zero  and  appropriately  signed).  However,  utility-based  models 
showed  a  marked  shift  towards  substitution,  for  which  one 
interpretation  is  risk  aversion. 
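
The link between substitution and risk aversion can be illustrated
with a pair of 50-50 lotteries. The sketch below is a minimal
illustration, assuming a two-attribute multiplicative utility of the
form u = k1*u1 + k2*u2 + K*k1*k2*u1*u2. The four-attribute case is
assumed to behave analogously. The scaling constants and function
names are hypothetical.

```python
def utility(u1, u2, k1, k2, K):
    """Two-attribute multiplicative utility with interaction constant K."""
    return k1 * u1 + k2 * u2 + K * k1 * k2 * u1 * u2


def expected_utility(lottery, k1, k2, K):
    """Expected utility of a list of (probability, (u1, u2)) outcomes."""
    return sum(p * utility(u1, u2, k1, k2, K) for p, (u1, u2) in lottery)


# Best on both attributes or worst on both, with equal chances.
all_or_nothing = [(0.5, (1.0, 1.0)), (0.5, (0.0, 0.0))]
# Best on one attribute and worst on the other, with equal chances.
mixed = [(0.5, (1.0, 0.0)), (0.5, (0.0, 1.0))]


def preference(k1, k2, K, tol=1e-12):
    """Which lottery the model prefers; the gap equals 0.5 * K * k1 * k2."""
    gap = expected_utility(all_or_nothing, k1, k2, K) - expected_utility(
        mixed, k1, k2, K
    )
    if abs(gap) < tol:
        return "indifferent"
    return "all_or_nothing" if gap > 0 else "mixed"


# Hypothetical constants: substituting (K < 0), additive (K = 0),
# and complementing (K > 0).
for K in (-0.8, 0.0, 0.8):
    print(K, preference(0.5, 0.5, K))
```

A substituting model (K < 0) prefers the mixed lottery, which
guarantees a good level on one attribute. That preference is what is
meant by multiattribute risk aversion, so a risk-averse response to
the gamble questions would push the elicited K downward.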

Weights  were  not  well  recovered  by  any  of  the  techniques.  In 
general,  though,  value  models  produced  the  steepest  weights  and 
utility procedures produced the flattest weights. As in the
two-attribute experiment, non-linear single-attribute functions were not
recovered.

Summary  and  conclusions 

The  most  important  and  clearest  findings  of  the  study  were  that 
multiplicative  (as  well  as  additive)  trade-off  structures  can  be 
learned  through  outcome  feedback  and  reliably  recovered  using  standard 
value  and  utility  assessment  techniques.  We  found  this  in  both  the 
two-  and  four-attribute  experiments.  The  multiplicative  models  were 
typically  "better"  than  the  additive  importance-weight  models  when  a 
multiplicative  model  was  taught. 

From  the  experiments  we  conclude  that: 

1) Subjects can learn additive and multiplicative value models via
outcome feedback, and the functional form of the taught model is
recoverable through standard value and utility assessment
techniques;

2) Distinctions among value, utility, and importance weight
elicitation techniques are behaviorally observable.

Two competing explanations exist for the consistent differences
between the value and utility models. Such differences may be primarily
consequences of a consistent response mode bias, or of a
psychologically valid distinction between the two techniques. We
believe  that  response  mode  bias  causes  the  steepened  weights.  But  we 
do  not  know  why  utility  models  produce  more  substituting  (or  risk 
averse)  models.  More  research,  perhaps  with  taught  utility  models,  is 
indicated. 

Although  subjects  were  clearly  able  to  learn  and  reproduce 
multiplicative  models  using  the  standard  assessment  procedures,  the 
practical  implications  of  this  finding  are  not  clear.  An  assessed 
multiplicative  model  will  perform  better  than  an  additive  one  when  the 
true  model  is  multiplicative,  but  how  much  better  is  equivocal. 
"Better"  can  be  defined  by  several  different  measures  of  agreement. 
More  important,  the  significance  of  the  improvement  is  highly 
dependent  upon  the  decision  problem  at  hand. 
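
The dependence on the decision problem can be illustrated with a small
sketch. It compares a hypothetical "true" complementing model with an
equal-weight additive approximation in two ways: by their correlation
over an evenly spaced grid of alternatives, and by the true value lost
when the additive model picks from one particular three-alternative
choice set. The model constants, the grid, and the choice set are all
assumptions made for illustration and are not taken from the
experiments.

```python
from math import sqrt


def true_value(q, s):
    """Hypothetical 'true' complementing model with 1:1 weights."""
    return 0.3 * q + 0.3 * s + 0.4 * q * s


def additive_value(q, s):
    """Equal-weight additive approximation to the model above."""
    return 0.5 * q + 0.5 * s


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Agreement over an evenly spaced 11 x 11 grid of alternatives.
levels = [i / 10 for i in range(11)]
grid = [(q, s) for q in levels for s in levels]
r = pearson(
    [true_value(q, s) for q, s in grid],
    [additive_value(q, s) for q, s in grid],
)

# Agreement on a single "choose the one best alternative" problem.
choice_set = [(1.0, 0.25), (0.6, 0.6), (0.2, 0.9)]
picked = max(choice_set, key=lambda a: additive_value(*a))
best = max(choice_set, key=lambda a: true_value(*a))
value_loss = true_value(*best) - true_value(*picked)

print("correlation over grid:", round(r, 3))
print("additive model picks", picked, "but the true best is", best)
print("true value lost:", round(value_loss, 3))
```

Under these assumptions the two models correlate very highly over the
grid, yet the additive model still selects a different alternative in
the choice set. Whether the resulting loss matters depends on what a
unit of value is worth in the problem at hand.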

The  three  primary  variables  controlling  model  agreement  that  may 
vary  from  one  problem  context  to  another  are: 

1.  The  multivariate  distribution  of  alternatives  along  attributes; 

2. The choice problem, e.g., choose the one best alternative, choose
the best X%, rank order all, etc.; and

3. The standard against which differences in actual obtained value
(utility) are to be compared.

Furthermore,  different  measures  of  agreement  make  different 
implicit  assumptions  about  these  variables.  And  again,  an  increment 
in  model  agreement  (as  measured  by  a  correlation  coefficient)  is  still 
dependent  upon  the  context.  A  .04  increment  may  translate  into 
pennies or thousands of dollars depending on the particular context.

"SMART" Models

Our  second  study  attempted  to  replicate  and  extend  the  findings 
of  Stillwell's  (1980)  work  with  bank  loan  officers  and  MAUM  methods. 
Stillwell  used  credit  card  applications  as  stimuli.  Outcome 
information  was  available;  a  large-scale  empirically  based 
discriminant  analysis  model  classified  the  applications  as  either 
"good"  or  "bad."  Stillwell's  study  compared  several  different  MAUM 
elicitation  techniques  and  one  holistically  based  decomposition 
technique. He concluded that all of the decomposed techniques worked
very well except for the holistic one, and that ease of
application  should  be  a  major  determinant  of  the  selection  among 
decomposition  techniques. 

A criticism of the study was that all of the subjects were
familiar with the empirical discriminant model before the experiment.
Therefore,  the  officers  might  have  simply  been  reproducing  the 
parameters  of  a  model  they  already  knew. 

A  better  real-world  study  would  incorporate  outcome  information 
with  substantive  expertise  that  did  not  include  specific  knowledge 
about a decomposed model. The current study attempted to do this.

Subjects and Task

Subjects  in  the  experiment  were  20  Loan  Examiners  working  for  a 
major  California  bank  in  its  Credit  Review  Department.  Credit  review 
functions  to  evaluate  the  quality  of  credit  that  the  bank  has  already 
granted.  It  is  independent  of  the  sales  department  and  is 
organizationally  a  part  of  the  office  of  the  controller  of  the  bank. 

The  task  was  to  construct  SMART  models  of  a  loan  examination. 
In  the  course  of  their  jobs,  loan  examiners  evaluate  already-granted 
credit  and  rate  it  based  on  a  large  amount  of  information  about  such 
things  as  financial  statements,  quality  of  management,  type  of 
industry,  economic  conditions,  etc.  The  loans  are  either  "criticized" 
or  "passed."  Criticized  loans  are  considered  to  be  serious  financial 
exposures  for  the  bank,  and  the  bank's  cash  reserves  in  part  depend 
upon  these  judgments.  We  had  access  to  a  data  bank  that  contained 
pass/criticize  outcome  information  based  on  the  entire  set  of  data 
typically  used  for  the  judgments,  and  end-of-year  financial  statements 
for  about  100  firms.  The  firms  were  all  mid-sized  wholesalers, 
retailers,  or  manufacturers.  The  data  base  was  about  evenly  split 
between  passed  and  criticized  loans,  although  in  the  population 
criticized  loans  for  firms  of  this  size  are  fairly  rare. 

Subjects  were  run  individually  by  experimenters  who  had 
decision-analytic  training.  Subjects  were  given  the  population  domain 
of  their  judgments.  All  of  the  subjects'  judgments  were  based  only 
upon the end-of-year financial statements. Subjects provided 20
holistic evaluations of ten passed and ten criticized loans randomly
selected from the data base. Subjects judged whether they thought the
firm should be criticized or passed and gave an anchored numerical
judgment (with 0 representing certain criticism, 500 complete
uncertainty, and 1000 certain pass).

Next  the  subjects  completed  two  SMART  value  models  which  also 
used  only  financial  statement  variables.  In  one  of  the  models  the 
attributes  were  preselected  using  statistical  techniques.  In  the 
other,  subjects  selected  financial  variables  for  the  models.  Order  of 
the  SMART  models  was  counterbalanced. 

Time  constraints  did  not  allow  for  the  elicitation  of  single 
dimension  value  functions.  As  a  proxy,  we  used  z-score 
transformations  to  give  all  attributes  the  same  mean  and  standard 
deviation. 
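
The scoring step just described can be sketched as follows. This is a
minimal illustration, assuming importance weights judged as ratios and
normalized to sum to one, attributes oriented so that higher is
better, and a cutoff of zero on the weighted sum of z-scores. The
attribute names, weights, cutoff, and firm data are hypothetical and
do not come from the bank's data base.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale one attribute to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def normalize(raw_weights):
    """Turn ratio-judged importance weights into weights summing to 1."""
    total = sum(raw_weights.values())
    return {name: w / total for name, w in raw_weights.items()}


def smart_scores(firms, raw_weights):
    """Additive SMART score: weighted sum of z-scored attributes."""
    weights = normalize(raw_weights)
    columns = {name: z_scores([f[name] for f in firms]) for name in weights}
    return [
        sum(weights[name] * columns[name][i] for name in weights)
        for i in range(len(firms))
    ]


def percent_correct(scores, outcomes, cutoff=0.0):
    """Percentage of firms whose pass/criticize call matches the outcome."""
    calls = ["pass" if s > cutoff else "criticize" for s in scores]
    hits = sum(c == o for c, o in zip(calls, outcomes))
    return 100.0 * hits / len(outcomes)


# Hypothetical firms and outcomes, for illustration only.
firms = [
    {"current_ratio": 2.4, "profit_margin": 0.08},
    {"current_ratio": 1.9, "profit_margin": 0.05},
    {"current_ratio": 1.1, "profit_margin": 0.01},
    {"current_ratio": 0.9, "profit_margin": -0.02},
]
outcomes = ["pass", "pass", "criticize", "criticize"]
# Least important attribute rated 10, the other judged twice as important.
raw_weights = {"profit_margin": 10, "current_ratio": 20}

scores = smart_scores(firms, raw_weights)
accuracy = percent_correct(scores, outcomes)
print([round(s, 2) for s in scores], accuracy)
```

Because z-scores stand in for elicited single-dimension value
functions, each attribute is treated as linear, and an attribute for
which lower is better would have to be sign-reversed before scoring.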

Results 

The basic design of the experiment was a simple three-condition
within-subjects design. The primary dependent variable was the
percentage  of  correct  classification.  For  the  SMART  models,  we 
individually  applied  each  model  to  the  bank's  database.  The  holistic 
judgments  were  made  on  cases  taken  from  the  database. 

Results indicated that all models correctly classified
approximately the same percentage of cases (around 70%). This is
fairly comparable to the rate of classification that is produced by a
least-squares discriminant model. A maximum likelihood logistic
discriminant  model,  however,  produces  somewhat  better  estimates  (about 
75%  correct  classification). 

The  accuracy  of  the  SMART  models  did  not  appear  to  be  dependent 
upon the number of attributes subjects selected for their models. The
number of attributes in the self-selected models ranged from three to
nine, with a mean of about six. The SMART models with pre-selected
attributes had five attributes.

Some  background  information  was  collected  on  each  of  the 
subjects. These data indicated a marginal tendency for experience and
age to produce both holistic and SMART judgments that were less
accurate. (Experience was measured with two related variables:
number  of  years  working  at  the  particular  bank;  and  number  of  years 
working  for  any  financial  institution.) 

Conclusions

The  ad  hoc  nature  of  the  SMART  modeling  should  serve  to  qualify 
the  results.  Essentially,  the  SMART  models  had  everything  going 
against  them,  and  yet  performed  at  the  same  level  as  the  holistic  and 
close  to  the  level  of  the  more  elaborate  statistical  models.  Making 
holistic  judgments  based  on  financial  statements  is  a  task  that  all  of 
the  loan  examiners  have  substantial  expertise  in.  No  subjects,  so  far 
as  we  were  able  to  tell,  had  any  experience  with  MAUM. 

Furthermore, the time constraints prevented elaborate
structuring, or the elicitation of more precise single-dimension value
functions. Had we structured more fully, taking into account, for
instance, the specific nature of the businesses, we almost certainly
would have increased the accuracy of classification. From a
practical perspective, this result suggests a possible application. Since
these very simple decomposed, judgment-based models did just about as well as
the statistical models based on the large data base, such models might
be  useful  as  a  "red  flag"  system  in  other  situations  where  a  data  base 
is  not  readily  available.  (The  current  bank  data-base  required 
searching  seven  years  of  records.) 

For validation purposes the results are encouraging but not
conclusive. Taken with the results of Stillwell's study they point to
further  work.  In  Stillwell's  study  the  structure  and  selection  of 
attributes  was  predetermined.  The  bank  spent  both  large  amounts  of 
time  and  money  determining  the  attributes,  in  addition  to  the 
computational  work  involved  in  the  discriminant  analysis.  The  current 
study  had  to  rely  on  fairly  "quick  and  dirty"  structuring.  SMART 
provided  a  high  level  of  accuracy  in  Stillwell's  study,  and  a  moderate 
level in the current one. Since a least-squares discriminant model
also  provides  a  lower  level  of  accuracy,  it  is  probably  the  case  that 
the  structuring  is  at  fault.  Further  work  on  structuring  is  needed 
for  this  particular  type  of  application.  Further  work  is  needed  in 
general  on  structuring  for  scientific  purposes. 

We believe that the marginal tendency for experience and age to
produce less accuracy in the holistic and decomposed judgments is
more a function of recalcitrance on the part of the older subjects
than of a true cognitive deficit in the ability to implement MAUM
or make the holistic judgments. Older subjects tended to express
reluctance  in  making  any  of  the  judgments  based  simply  on  the  one  year 
of  financial  statements. 

IV.  Conclusions 

Our  studies  on  validating  multiattribute  utility  techniques  have 
taken  a  two-pronged  approach:  real  world  validation  in  settings  that 
have  an  outcome  criterion,  and  laboratory  validation  with  teaching  and 
recovering  value  functions  using  standard  MAUM  assessment  techniques. 
The two studies reported in this final report reflected this
validation strategy, and their results fit into an emerging story of
the validity of MAUM.

The first part of the story is about simplicity. Our results
indicate that simple approximate models and assessment techniques do
as well as or better than complex ones if, in fact, the "true" model is
simple. This was a result of Stillwell (1981) as well as of John et
al.  (1982)  and  it  is  replicated  in  the  present  MCPL  experiment.  If 
the  true  models  become  more  complex,  the  more  complicated  elicitation 
techniques  show  some  improvement  over  the  simple  approximation 
techniques.  This  result  was  especially  obvious  in  analyzing  taught 
multiplicative  value  functions  in  the  MCPL  study.  Surprisingly, 
however,  if  the  true  model  is  very  complex,  simple  assessment 
techniques  appear  to  do  relatively  well  again. 

The  second  part  of  the  emerging  validation  story  is  about  error. 
No  multiattribute  utility  model  or  technique  is  perfect.  In  the  MCPL 
studies,  we  were  struck  by  the  lack  of  ability  of  the  MAU  techniques 
to  recover  the  weights  in  complex  four-attribute  value  functions. 
Furthermore,  while  the  model  form  could  usually  be  identified,  the 
elicited  interaction  parameters  were  frequently  quite  far  off  the  true 
ones.  Finally,  in  the  bank  study,  the  SMART  model  did  not  show 
exceptional  performance  in  classifying  criticisms  of  bank  loans 
correctly.  The  fault  in  both  studies  may  not  lie  so  much  with  the 
models  as  with  the  complexity  of  the  underlying  structures.  In  the 
MCPL  study  accuracy  degraded  with  the  complexity  of  the  model 
structure  (multiplicative,  unequal  weights);  in  the  bank  study  the 
complexity  was  introduced  by  the  real  world  problem  itself  --  the 
structure  imposed  was  probably  much  too  simple  to  capture  that 
complexity.  Complexities  in  real  world  structures  and  in  the  models 
applied to them seem to harm MAUM techniques.

The lessons of our validation studies suggest two strategies for
coping  with  complex  structures  and  models.  The  first  would  attempt  to 
reduce  the  complexity  in  the  problem  structures,  essentially  by  a 
continued  search  for  simple  and  independent  sets  of  attributes  that 
lend  themselves  to  more  additive  modeling.  The  second  strategy  is  to 
increase  model  complexity  --  up  to  a  point.  If  there  are  reasons  to 
believe  that  the  underlying  preferences  are  non-additive,  and  if  the 
deviations  from  additivity  are  not  gross  or  extreme,  and  if 
restructuring  does  not  help,  then  one  should  probably  attempt  somewhat 
more  complex  models  and  assessment  techniques.  But  our  results 
suggest  reversion  to  simple  approximations  if  the  structures  of 
underlying  preference  forms  become  overly  complex.  In  those  cases  the 
more  complex  elicitation  forms  are  unlikely  to  detect  correctly  the 
subtleties of the complex realities, and, worse yet, may lead the
analysis further astray.